10. Questions for a Dataset

Let's use pandas to take a look at the data! Run the cells in the Jupyter Notebook below. What are good questions you can ask based on this information? (There is more information about the columns in the dataset below the Jupyter Notebook.)

Breast Cancer Wisconsin (Diagnostic) Dataset from the UCI Machine Learning Repository

(The dataset is included in the workspace for you as "cancer_data.csv". If you're interested, you can explore it further on Kaggle.)
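If you work outside the classroom notebook, a first look at the file could be sketched roughly as follows. This is only a minimal sketch: the helper name first_look is made up for illustration, and the code assumes "cancer_data.csv" sits in the current working directory.

```python
from pathlib import Path

import pandas as pd


def first_look(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: its dtype, missing-value count, and unique-value count.

    Hypothetical helper for illustration; it is not part of the lesson notebook.
    """
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


# Assumption: the CSV from the workspace has been copied next to this script.
csv_path = Path("cancer_data.csv")
if csv_path.exists():
    df = pd.read_csv(csv_path)
    print(df.shape)        # (number of rows, number of columns)
    print(df.head())       # first five rows
    print(first_look(df))  # one summary row per column
```

A summary like this helps with question-asking: the column with only two unique values is probably the diagnosis, and columns with missing values may need care before any comparison.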

Attribute Information:

  1. ID number
  2. Diagnosis (M = malignant, B = benign)
  3. 30 features

The following ten features are computed for each cell nucleus. For each of these ten features, three columns are created (the mean, the standard error, and the max value), which gives the 30 feature columns.

Feature            Description
Radius             Mean of distances from center to points on the perimeter
Texture            Standard deviation of gray-scale values
Perimeter
Area
Smoothness         Local variation in radius lengths
Compactness        Perimeter^2 / Area - 1.0
Concavity          Severity of concave portions of the contour
Concave Points     Number of concave portions of the contour
Symmetry
Fractal Dimension  "Coastline approximation" - 1
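To see how the 30 feature columns break down into ten features times three statistics, the column names can be grouped by their suffix. The sketch below assumes the columns follow a feature_statistic naming pattern such as radius_mean, and that the two non-feature columns are called id and diagnosis. Both are assumptions, so check df.columns in the notebook first. The function name is hypothetical.

```python
def group_columns_by_statistic(columns, skip=("id", "diagnosis")):
    """Group column names by the text after their last underscore.

    Assumes names look like "<feature>_<statistic>" (for example "radius_mean").
    Columns listed in `skip`, or without an underscore, are ignored.
    """
    groups = {}
    for col in columns:
        if col in skip:
            continue
        feature, sep, statistic = col.rpartition("_")
        if not sep:
            continue  # no underscore, so no statistic suffix to split off
        groups.setdefault(statistic, []).append(feature)
    return groups


# Example with a handful of hypothetical column names.
example = ["id", "diagnosis", "radius_mean", "area_mean", "concave_points_mean"]
print(group_columns_by_statistic(example))
```

If the naming assumption holds for the real file, passing df.columns should yield three groups of ten features each.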

QUESTION:

What questions would you ask?

ANSWER:

How would you go about answering these questions? Which parts of this dataset might you use for each one?
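One way to start on a question such as "do malignant and benign tumors differ in size?" is to compare a feature's average across the two diagnosis groups. The sketch below is only illustrative: the function name is hypothetical, the column names diagnosis and radius_mean are assumptions about the file, and the toy rows are invented purely so the code runs. They are not taken from the dataset.

```python
import pandas as pd


def compare_by_diagnosis(df, feature_cols, diagnosis_col="diagnosis"):
    """Return the mean of each selected feature, one row per diagnosis group."""
    return df.groupby(diagnosis_col)[feature_cols].mean()


# Toy rows with made-up values, NOT real measurements from cancer_data.csv.
toy = pd.DataFrame({
    "diagnosis": ["M", "B", "M", "B"],
    "radius_mean": [20.0, 12.0, 18.0, 10.0],
})

result = compare_by_diagnosis(toy, ["radius_mean"])
print(result)
```

With the real data you would pass the DataFrame loaded from "cancer_data.csv" and whichever feature columns your question is about. This uses the diagnosis column to define the groups and the feature columns to supply the values being compared.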